A Mean Deviation Based Splitting Criterion for Classification Tree

نویسندگان

  • Syed Jawad
  • Ali Shah
  • Syed Jawad Ali Shah
  • Qamruz Zaman
چکیده

For the learning of Classification Tree, many researchers have used different splitting criteria, in which most commonly impurity-based criteria are: Gini index, Entropy function and Exponent-based index. By comparing Misclassification rates, none of the splitting criterion can be declared as providing best results in every situation. In this study, a new Mean Deviation based index has been proposed and a simulation study is designed, to explore which measure gives best result in what type of situation? From simulation study, it is concluded that generally new proposed M.D-based index and Exponent-based index give excellent results in case of imbalanced data. While, in case of balanced data, Gini index and Entropy function have less misclassification rates. Keywords— "Classification Tree", "Misclassification rate", "Impurity-based criteria", "Simulation", "Imbalanced data"

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Assessing Behavioral Patterns of Motorcyclists Based on Traffic Control Device at City Intersections by Classification Tree Algorithm

According to the forensic statistics, in Iran, 26 percent of those killed in traffic accidents are motorcyclists in recent years. Thus, it is necessary to investigate the causes of motorcycle accidents because of the high number of motorcyclist casualties. Motorcyclists' dangerous behaviors are among the causes of events that are discussed in this study. Traffic signs have the important role of...

متن کامل

Families of splitting criteria for classification trees

Several splitting criteria for binary classification trees are shown to be written as weighted sums of two values of divergence measures. This weighted sum approach is then used to form two families of splitting criteria. One of them contains the chi-squared and entropy criterion, the other contains the mean posterior improvement criterion. Both family members are shown to have the property of ...

متن کامل

The Comparison of Gini and Twoing Algorithms in Terms of Predictive Ability and Misclassification Cost in Data Mining: An Empirical Study

The classification tree is commonly used in data mining for investigating interaction among predictors, particularly. The splitting rule and the decision trees technique employ algorithms that are largely based on statistical and probability methods. Splitting procedure is the most important phase of classification tree training. The aim of this study is to compare Gini and Twoing splitting rul...

متن کامل

Comparison of Machine Learning Algorithms for Broad Leaf Species Classification Using UAV-RGB Images

Abstract: Knowing the tree species combination of forests provides valuable information for studying the forest’s economic value, fire risk assessment, biodiversity monitoring, and wildlife habitat improvement. Fieldwork is often time-consuming and labor-required, free satellite data are available in coarse resolution and the use of manned aircraft is relatively costly. Recently, unmanned aeria...

متن کامل

Comparing different stopping criteria for fuzzy decision tree induction through IDFID3

Fuzzy Decision Tree (FDT) classifiers combine decision trees with approximate reasoning offered by fuzzy representation to deal with language and measurement uncertainties. When a FDT induction algorithm utilizes stopping criteria for early stopping of the tree's growth, threshold values of stopping criteria will control the number of nodes. Finding a proper threshold value for a stopping crite...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015